Towards a Generic Approach for Bilingual Lexicon Extraction from Comparable Corpora

نویسندگان

  • Dhouha Bouamor
  • Nasredine Semmar
  • Pierre Zweigenbaum
چکیده

This paper presents an approach that extends the standard approach used for bilingual lexicon extraction from comparable corpora. We focus on the problem associated to polysemous words found in the seed bilingual lexicon when translating source context vectors. To improve the adequacy of context vectors, the use of a WordNetbased Word Sense Disambiguation process is tested. Experimental results on four specialized French-English comparable corpora show that our method outperforms two state-of-the-art approaches.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora

This paper presents an approach that extends the standard approach used for bilingual lexicon extraction from comparable corpora. We focus on the unresolved problem of polysemous words revealed by the bilingual dictionary and introduce a use of a Word Sense Disambiguation process that aims at improving the adequacy of context vectors. On two specialized FrenchEnglish comparable corpora, empiric...

متن کامل

Improving Corpus Comparability for Bilingual Lexicon Extraction from Comparable Corpora

Previous work on bilingual lexicon extraction from comparable corpora aimed at finding a good representation for the usage patterns of source and target words and at comparing these patterns efficiently. In this paper, we try to work it out in another way: improving the quality of the comparable corpus from which the bilingual lexicon has to be extracted. To do so, we propose a measure of compa...

متن کامل

Compositionnalité et contextes issus de corpus comparables pour la traduction terminologique (Compositionality and Context for Bilingual Lexicon Extraction from Comparable Corpora) [in French]

Compositionality and Context for Bilingual Lexicon Extraction from Comparable Corpora In this article, we study the possibilities of improving the alignment of equivalent terms monolingually acquired from bilingual comparable corpora. Our overall objective is to identify and to translate highly specialised terminology. We applied a compositional approach enhanced with pre-processed context info...

متن کامل

Word Co-occurrence Counts Prediction for Bilingual Terminology Extraction from Comparable Corpora

Methods dealing with bilingual lexicon extraction from comparable corpora are often based on word co-occurrence observation and are by essence more effective when using large corpora. In most cases, specialized comparable corpora are of small size, and this particularity has a direct impact on bilingual terminology extraction results. In order to overcome insufficient data coverage and to make ...

متن کامل

Bilingual Lexicon Extraction from Comparable Corpora Enhanced with Parallel Corpora

In this article, we present a simple and effective approach for extracting bilingual lexicon from comparable corpora enhanced with parallel corpora. We make use of structural characteristics of the documents comprising the comparable corpus to extract parallel sentences with a high degree of quality. We then use state-of-the-art techniques to build a specialized bilingual lexicon from these sen...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013